40 research outputs found

Capacity Allocation for Clouds with Parallel Processing, Batch Arrivals, and Heterogeneous Service Requirements

Author: Beck J. Christopher
Bergsma Shane
Furman Eugene
Senderovich Arik
Publication date: 20/09/2022

Problem Definition: Allocating sufficient capacity to cloud services is a challenging task, especially when demand is time-varying, heterogeneous, contains batches, and requires multiple types of resources for processing. In this setting, providers decide whether to reserve portions of their capacity for individual job classes or to offer it in a flexible manner. Methodology/results: In collaboration with Huawei Cloud, a worldwide provider of cloud services, we propose a heuristic policy that allocates multiple types of resources to jobs and also satisfies their pre-specified service level agreements (SLAs). We model the system as a multi-class queueing network with parallel processing and multiple types of resources, where arrivals (i.e., virtual machines and containers) follow time-varying patterns and require at least one unit of each resource for processing. While virtual machines leave if they are not served immediately, containers can join a queue. We introduce a diffusion approximation of the offered load of such a system and investigate its fidelity as compared to the observed data. Then, we develop a heuristic approach that leverages this approximation to determine capacity levels that satisfy probabilistic SLAs in the system with fully flexible servers. Managerial Implications: Using a data set of cloud computing requests over a representative 8-day period from Huawei Cloud, we show that our heuristic policy results in a 20% capacity reduction and better service quality as compared to a benchmark that reserves resources. In addition, we show that the system utilization induced by our policy is superior to the benchmark, i.e., it implies less idling of resources in most instances. Thus, our approach enables cloud operators to both reduce costs and achieve better performance.
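As a rough illustration of how an offered-load approximation can be turned into a capacity level under a probabilistic SLA, the sketch below assumes that the offered load at a given time is approximately normal with a known mean and variance, and returns the smallest integer capacity that the load stays below with the target probability. This is not the authors' heuristic policy; the function name `capacity_for_sla` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def capacity_for_sla(mean_load: float, var_load: float, sla: float = 0.99) -> int:
    """Smallest integer capacity c with P(load <= c) >= sla.

    Assumption (not from the paper): the offered load is approximately
    normal with the given mean and variance.
    """
    if not 0.0 < sla < 1.0:
        raise ValueError("sla must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(sla)  # standard normal quantile for the SLA level
    return math.ceil(mean_load + z * math.sqrt(var_load))


if __name__ == "__main__":
    # Hypothetical load: mean 100 resource units, variance 100 (std 10).
    for level in (0.90, 0.99, 0.999):
        print(level, capacity_for_sla(100.0, 100.0, level))
```

A stricter SLA level moves the quantile further into the tail, so the required capacity grows with the variability of the load and not only with its mean.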

arXiv.org e-Print Archive

A Comparison of Retrieval Models using Term Dependencies

Author: Bergsma Shane
Croft W Bruce
Huston Samuel
Risvik Knut Magne
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Whose Tweets are Surveilled for the Police: An Audit of Social-Media Monitoring Tool via Log Files

Author: An Jisun
Beckett Katherine
Bergsma Shane
Buolamwini Joy
Cesare Nina
Chang Jonathan
Ensign Danielle
Klein Daniel
Preotiuc-Pietro Daniel
Richard Landis J.
Sandvig Christian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/01/2020

Social media monitoring by law enforcement is becoming commonplace, but little is known about what software packages for it do. Through public records requests, we obtained log files from the Corvallis (Oregon) Police Department's use of social media monitoring software called DigitalStakeout. These log files include the results of proprietary searches by DigitalStakeout that were running over a period of 13 months and include 7240 social media posts. In this paper, we focus on the Tweets logged in this data and consider the racial and ethnic identity (through manual coding) of the users that are therein flagged by DigitalStakeout. We observe differences in the demographics of the users whose Tweets are flagged by DigitalStakeout compared to the demographics of the Twitter users in the region; however, our sample size is too small to determine significance. Further, the demographics of the Twitter users in the region do not seem to reflect that of the residents of the region, with an apparent higher representation of Black and Hispanic people. We also reconstruct the keywords related to a Narcotics report set up by DigitalStakeout for the Corvallis Police Department and find that these keywords flag Tweets unrelated to narcotics or flag Tweets related to marijuana, a drug that is legal for recreational use in Oregon. Almost all of the keywords have a common meaning unrelated to narcotics (e.g., broken, snow, hop, high), which calls into question the utility that such a keyword-based search could have to law enforcement.
Comment: 21 pages, 2 figures. To be published in the FAT* 2020 Proceedings.

arXiv.org e-Print Archive

Crossref

Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

Author: Alzahrani Sultan
Bergsma Shane
Bethlehem Jelke G
Buolamwini Joy
Chen Xin
Ciot Morgane
Compton Ryan
Goot Rob
Goswami Sumit
Hecht Brent
Huang Gao
Jung Soon-Gyo
Kim Yoon
McCorriston James
Mislove Alan
Nguyen Dong
Rosenthal Sara
Sap Maarten
Schler Jonathan
Zamal Faiyaz Al
Zhang Jinxue
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of statistical inference tools towards dominant languages and groups. While demographic attribute inference could be used to mitigate such bias, current techniques are almost entirely monolingual and fail to work in a global environment. We address these challenges by combining multilingual demographic inference with post-stratification to create a more representative population sample. To learn demographic attributes, we create a new multimodal deep neural architecture for joint classification of age, gender, and organization-status of social media users that operates in 32 languages. This method substantially outperforms current state of the art while also reducing algorithmic bias. To correct for sampling biases, we propose fully interpretable multilevel regression methods that estimate inclusion probabilities from inferred joint population counts and ground-truth population counts. In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.
Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web Conference (WWW '19)
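The bias-correction idea in this abstract can be sketched with plain cell-wise post-stratification. The example below is a simplification under stated assumptions: it estimates each stratum's inclusion probability as the ratio of inferred social-media users to census population, whereas the paper describes multilevel regression methods for this step. The function names, strata, and counts are hypothetical.

```python
def naive_mean(observed, stratum_means):
    """Unweighted mean over sampled users, ignoring sampling bias."""
    total = sum(observed.values())
    return sum(observed[s] * stratum_means[s] for s in observed) / total


def poststratified_mean(observed, census, stratum_means):
    """Mean reweighted by inverse inclusion probability per stratum.

    observed: inferred number of sampled users per stratum
    census: ground-truth population count per stratum
    stratum_means: mean of the quantity of interest per stratum
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, n in observed.items():
        if n == 0 or census.get(stratum, 0) == 0:
            continue  # no basis for estimating this stratum
        inclusion_prob = n / census[stratum]  # simple ratio estimate (assumption)
        weight = 1.0 / inclusion_prob
        weighted_sum += n * weight * stratum_means[stratum]
        weight_total += n * weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical data: young users are heavily over-represented online.
    observed = {"young": 80, "old": 20}
    census = {"young": 500, "old": 500}
    means = {"young": 0.9, "old": 0.1}
    print(naive_mean(observed, means))
    print(poststratified_mean(observed, census, means))
```

With these made-up numbers, the unweighted mean is pulled towards the over-represented stratum, while the reweighted mean reflects the census proportions.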

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Universaar

Measuring, Understanding, and Classifying News Media Sympathy on Twitter after Crisis Events

Author: Abbar Sofiane
An Jisun
Bergsma Shane
Cha Meeyoung
Diakopoulos Nicholas
dos Reis Júlio Cesar
Goldberg Yoav
Hansen Lars Kai
Kim Yonghwan
Kwak Haewoon
Levenshtein V. I.
Lotan Gilad
Lui Marco
Mejova Yelena
Mikolov Tomas
Price Vincent
Schulz Axel
Shoemaker P.J.
Singer J.
Vargas Saúl
Zeiler Matthew D.
Publication date: 15/03/2018

This paper investigates bias in coverage between Western and Arab media on Twitter after the November 2015 Beirut and Paris terror attacks. Using two Twitter datasets covering each attack, we investigate how Western and Arab media differed in coverage bias, sympathy bias, and resulting information propagation. We crowdsourced sympathy and sentiment labels for 2,390 tweets across four languages (English, Arabic, French, German), built a regression model to characterize sympathy, and thereafter trained a deep convolutional neural network to predict sympathy. Key findings show: (a) both events were disproportionately covered; (b) Western media exhibited less sympathy, with each region's media more sympathetic towards the affected country in its own region; (c) sympathy predictions supported the ground-truth analysis that Western media was less sympathetic than Arab media; (d) sympathetic tweets do not spread any further. We discuss our results in light of global news flow, Twitter affordances, and public perception impact.
Comment: In Proc. CHI 2018 Papers program. Please cite: El Ali, A., Stratmann, T., Park, S., Schöning, J., Heuten, W. & Boll, S. (2018). Measuring, Understanding, and Classifying News Media Sympathy on Twitter after Crisis Events. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA. DOI: https://doi.org/10.1145/3173574.317413

arXiv.org e-Print Archive

Crossref

CWI's Institutional Repository

Bootstrapping path-based pronoun resolution

Author: Shane Bergsma
Publication date: 01/01/2006

We present an approach to pronoun resolution based on syntactic paths. Through a simple bootstrapping procedure, we learn the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities. This path information enables us to handle previously challenging resolution instances, and also robustly addresses traditional syntactic coreference constraints. Highly coreferent paths also allow mining of precise probabilistic gender/number information. We combine statistical knowledge with well-known features in a Support Vector Machine pronoun resolution classifier. Significant gains in performance are observed on several datasets.

CiteSeerX

Corpus-Based Learning for Pronominal Anaphora Resolution (University of Alberta, Master of Science thesis)

Author: Shane Anthony Bergsma

Permission is hereby granted to the University of Alberta Library to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as hereinbefore provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatever without the author’s prior written permission.

CiteSeerX

Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

Author: Colin Cherry
Shane Bergsma
Publication date: 01/07/2011

Graph-based dependency parsing can be sped up significantly if implausible arcs are eliminated from the search-space before parsing begins. State-of-the-art methods for arc filtering use separate classifiers to make pointwise decisions about the tree; they label tokens with roles such as root, leaf, or attaches-to-the-left, and then filter arcs accordingly. Because these classifiers overlap substantially in their filtering consequences, we propose to train them jointly, so that each classifier can focus on the gaps of the others. We integrate the various pointwise decisions as latent variables in a single arc-level SVM classifier.
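To make the pointwise filtering step concrete, the following minimal sketch prunes candidate arcs from token-level role labels. The interpretation of each role is an assumption made for illustration: a leaf heads no other token, a root token has no head among the sentence tokens, and an attaches-to-the-left token cannot take a head to its right. The role labels are given by hand here, whereas the paper learns them with classifiers; `filter_arcs` is a hypothetical name.

```python
from itertools import permutations


def filter_arcs(n_tokens, roles):
    """Return candidate (head, modifier) arcs that survive role-based filtering.

    roles maps a token index to a set of role labels. The rule attached to
    each label below is an illustrative assumption, not the paper's definition.
    """
    kept = set()
    for head, mod in permutations(range(n_tokens), 2):
        if "leaf" in roles.get(head, ()):
            continue  # a leaf cannot be the head of any arc
        if "root" in roles.get(mod, ()):
            continue  # the root token takes no head among the tokens
        if "attaches-to-the-left" in roles.get(mod, ()) and head > mod:
            continue  # the head of this token must lie to its left
        kept.add((head, mod))
    return kept


if __name__ == "__main__":
    # Hypothetical three-token sentence with hand-assigned roles.
    roles = {0: {"leaf"}, 1: {"root"}, 2: {"attaches-to-the-left"}}
    print(sorted(filter_arcs(3, roles)))
```

Each rule removes a subset of the quadratic arc space, and different rules can remove the same arc, which is the overlap that motivates training the classifiers jointly.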

CiteSeerX

NRC Publications Archive